We compute the stationary in-degree probability, $P_{in}(k)$, for a growing network model with directed edges and arbitrary out-degree probability. In particular, under preferential linking, we find that if the nodes have a light-tailed (finite variance) out-degree distribution, then the corresponding in-degree distribution behaves as $k^{-3}$. Moreover, for an out-degree distribution with a scale-invariant tail, $P_{out}(k) \sim k^{-\alpha}$, the corresponding in-degree distribution has exactly the same asymptotic behavior only if $2 < \alpha < 3$ (infinite variance). Similar results are obtained when attractiveness is included. We also present some results on descriptive statistics such as the correlation between the number of incoming links, $D_{in}$, and outgoing links, $D_{out}$, and the conditional expectation of $D_{in}$ given $D_{out}$, and we calculate these measures for the WWW network. Finally, we present an application to the scientific publications network. The results presented here can explain the tail behavior of the in/out-degree distributions observed in many real networks.
I. INTRODUCTION



Barabási and Albert [1] discovered that several networks in nature share a strange topological characteristic: they have a scale-free [2-4] degree distribution, $P(k) \sim k^{-\alpha}$, where the degree of a vertex is defined as the total number of its connections. Nowadays, this empirical behavior has been confirmed in a great number of completely different empirical networks, from biological networks to e-mail networks, including scientific publication networks. In [1] they also proposed a model (B-A model) to explain this behavior. The model can be formulated as follows: 1) start with a network with $N$ nodes, connected by $j$ edges in an arbitrary way, and 2) at each time step a new node with $m$ edges appears, and each of its edges connects to an existing node according to some probability law, $\pi$. The probability that a new edge attaches to a node with degree $k$, $\pi_k$, was defined [1] as proportional to the degree of the node. In particular, they showed that with this attachment law,

$$\pi_k = \frac{k N_k}{\sum_j j N_j}, \qquad (1)$$

where $N_k$ is the number of nodes with degree $k$, the stationary degree distribution has a power-law tail, $P(k) \sim k^{-3}$. In [5] they computed the stationary degree probability (not only the tail behavior), or limit degree distribution, for a model similar to the B-A one, but for a generalization of the preferential linking attachment law. They introduced a new parameter, the attractiveness, $A$ (in their case $A > 0$), and defined the attachment law as:

$$\pi_k = \frac{(k + A) N_k^{in}}{\sum_j (j + A) N_j^{in}}, \qquad (2)$$

where $N_k^{in}$ is the number of nodes with in-degree equal to $k$. They found in this case that $P(k) \sim k^{-(2+A/m)}$, which is more flexible for comparison with empirical networks. Typically, the degree distributions of real networks satisfy $P(k) \sim k^{-\alpha}$ with $2 < \alpha < 3$. But the B-A model and similar ones [5], no matter what the attachment law is, have a major drawback: the number ($m$) of edges that arise from new nodes is fixed. In almost all real networks, new nodes do not all have the same number of edges. Instead, the number of edges of a randomly selected new node (from a real network) is a random variable. So, in order to be more realistic, we will study the behavior of the B-A model when new nodes appear with a random number of edges, but in the more general context of directed growing networks. In this context new questions arise.

Directed networks are characterized by the fact that the edges are directed (arrows): each node has edges that point to it and others that originate from it. The in-degree of a node is defined as the number of its incoming edges, and the out-degree as the number of its outgoing edges. The most studied directed growing networks have been the WWW network [7, 8] and the scientific publications network [6]. In the first one, each node represents a web page and the hyperlinks (references to other web pages) represent the directed edges or links. In the second one, each paper is a node, and its references are the directed links. In this last case, the in-degree distribution represents the distribution of citations of a randomly selected paper, and the out-degree distribution represents the number of references of a randomly selected paper. Empirical directed growing networks follow in general one of two possible behaviors. In the first case they have an exponential out-degree distribution, $P_{out}(k) \sim a^k$ ($0 < a < 1$), or an out-degree distribution taking finitely many values, associated with an in-degree distribution with a power-law tail, $P_{in}(k) \sim k^{-\alpha}$, where typically $\alpha \approx 3$. In the second case the out-degree distribution satisfies $P_{out}(k) \sim k^{-\beta}$, and is associated with $P_{in}(k) \sim k^{-\alpha}$ with $\alpha \sim \beta$. Examples, such as biological, WWW, or communication networks, can be found in [2, 3, 4, 9].

In this paper, we address the question of why empirical growing directed networks show this strange general behavior for the tails of the in/out-degree distributions. We study a particular growing network model (a generalization of [1], to be precise), obtaining the stationary joint in-out degree distribution, $P_{in,out}(j,k)$, and some quantities derived from it, such as the marginal distribution, $P_{in}(k)$, the covariance, and the conditional expectation of the number of in-links given the number of out-links. In particular, studying $P_{in}(k)$ in detail, we prove (for the model presented here) that the in/out tail behavior reported for real networks [HOB] is to be expected. Finally, we present an application to the most "pure" (extremely few double arrows) growing directed network: the scientific publication network. In this application, we show the relevance of having an expression for the limit in-degree distribution ($P_{in}(k)$) for an arbitrary out-degree distribution ($P_{out}(k)$).



II. GROWING DIRECTED NETWORK MODEL

Before describing the model, it is important to remark that real directed growing networks have in general a considerable asymmetry between the in-links and out-links of a node. For example, nobody cares much about how many references (out-links) their own paper has, but people are interested in the number of citations (in-links) that their own paper receives. That is why we are going to treat the out-links of a new node and the in-links in a completely different way. In particular, a node can receive (with positive probability) a connection from a new node at any moment, but typically a node cannot change the set of nodes it points to. This is very clear in the scientific publications network. In this network the in-degree distribution has been extensively studied [15], whereas the out-degree distribution has been poorly reported [10, 11]. Nevertheless, in the case of the WWW network, the outgoing links (hyperlinks) can change at any moment, and new hyperlinks can be added or old hyperlinks can be redirected. In [7, 8] some models were proposed for describing this network taking into account the characteristics mentioned above. However, these models do not consider that the new nodes have a particular out-degree distribution, i.e. the models are constructed under the hypothesis that new nodes have a fixed number of out-links. The major problem of both models is that the nodes (web pages) do not have a controlled number of out-links: they can have a huge number of them, which does not seem realistic. Our strategy for modeling these networks is completely different from the ones proposed in [7, 8]: for us, most of the variability in the number of out-links is determined when the node appears, which we define as "intrinsic" variability, and is not a product of updating nodes. We think that in many real networks the updating of nodes gives only a small correction compared with the "intrinsic" variability. This assumption is at the core of our model. In a real network the "intrinsic" variability arises for different reasons that are hard to know (why does the number of references of a randomly selected scientific paper follow some particular distribution?), but typically understanding it is not a major question.

FIG. 1: Scheme of the growing network model. At each time step a new node (shown in black) with $D_{out}$ out-links appears; these links point towards existing nodes. $D_{out}$ is not a fixed number; on the contrary, it is a random variable. The degree vectors at times 0 and 1 are $N_0 = (1,4,0,0,1,0,0,0,\dots,0,\dots)$ and $N_1 = (1,4,1,0,0,1,0,0,\dots,0,\dots)$.

Now, we describe the growing network model: 1) initially the network consists of $N$ nodes connected in a given arbitrary way, 2) at each time step, say time step $n+1$, a node with $D_{out}$ outgoing edges appears, where $D_{out}$ is a random variable ($\sum_{k \in \mathbb{N}} P(D_{out} = k) = 1$), and 3) each new directed edge points to an existing node with some probability law $\pi_{n+1}$ (uniform, preferential linking, etc.). Fig. 1 shows a scheme of the model. If $\pi_{n+1}$ is an arbitrary function that depends on the degree vector at time $n$, $N_n = (N_1^n, N_2^n, \dots, N_k^n, \dots)$, and/or $N_{in,n}$ ($N_{out,n}$), then the growing network model described above is a Markov chain taking values in $\mathbb{N}_0^{\mathbb{N}}$ or $\mathbb{N}_0^{\mathbb{N}} \times \mathbb{N}_0^{\mathbb{N}}$ with transition probabilities given by $\pi_{n+1}$. In this work (under the Markovian hypothesis), we show an easy way to compute stationary (in/out) degree probabilities for arbitrary $\pi_{n+1}$. An important part of this article is devoted to the study of the model under the law:

$$\pi_k^{n+1} = \frac{(k + A) N_k^n}{\sum_j (j + A) N_j^n}, \qquad (3)$$

and in Section II D we show some results under different $\pi$'s. The law of eq. 3 corresponds to preferential linking on degree with attractiveness. This probability is well defined for values of $A$ greater than or equal to $-B$, where

$$B = \min_k \{ k : P(D_{out} = k) > 0 \}. \qquad (4)$$

For this attachment law, the model is in fact an extension of the Albert-Barabási model, although in this case $D_{out}$ is a random variable with an arbitrary distribution, $P(D_{out} = k)$ with $k \in \mathbb{N}$, and the edges are directed. The limit (stationary) in-degree distribution and the limit degree distribution have not been reported, even for simple cases such as $D_{out}$ taking the values 1 and 2 with probabilities $p_1$ and $1 - p_1$ respectively. Moreover, even in the undirected case, it is not known whether in general the limit degree distribution ($P(k)$) satisfies a superposition principle (linear combination).
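The three steps of the model can be sketched as a small simulation. This is only an illustration under stated assumptions: the attachment probability is taken to be proportional to (total degree + A), which is how preferential linking on degree with attractiveness is described in the text; the function name `grow_network`, the two-node seed network, and the `sample_out_degree` callback are hypothetical choices, and repeated targets (multiple edges) are allowed.

```python
import random


def grow_network(steps, sample_out_degree, attractiveness=0.0, seed=0):
    """Grow a directed network and return (in_degree, out_degree) lists.

    Hypothetical helper: each new node draws D_out from
    sample_out_degree(rng) and sends every edge to an existing node chosen
    with probability proportional to (total degree + attractiveness).
    """
    rng = random.Random(seed)
    # Seed network (arbitrary choice): two nodes pointing at each other.
    in_deg = [1, 1]
    out_deg = [1, 1]
    for _ in range(steps):
        d_out = sample_out_degree(rng)
        # Weights are computed before the new node is added, so all of its
        # edges are drawn from the state of the network at time n.
        weights = [i + o + attractiveness for i, o in zip(in_deg, out_deg)]
        targets = rng.choices(range(len(in_deg)), weights=weights, k=d_out)
        for t in targets:
            in_deg[t] += 1
        in_deg.append(0)
        out_deg.append(d_out)
    return in_deg, out_deg


# D_out takes the values 1 and 2 with equal probability.
in_deg, out_deg = grow_network(2000, lambda rng: rng.choice([1, 2]))
print(sum(in_deg) == sum(out_deg))  # every edge has one head and one tail
```

Recomputing the weights at every step keeps the sketch short at the price of quadratic running time; for long runs a cumulative-weight structure would be a natural replacement.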



A. Stationary Probabilities 

The number of out-links does not depend on time (see Appendix A for additional details); therefore, the limit out-degree distribution satisfies $P_{out}(k) = P(D_{out} = k)$. Note that the out-degree distribution is defined a priori (in accordance with the specific network), imposing in this way the asymmetry mentioned before between the in- and out-links. We are interested in obtaining the limit degree distribution, $P(k)$, and the limit in-degree distribution, $P_{in}(k)$. In order to compute the latter, we first compute the stationary joint degree and out-degree distribution, $P_{deg,out}(j,k)$. If the network is distributed according to the stationary probability, then the probability that a randomly chosen node has $k$ out-links and $j$ total links, $\mathbf{D} = (D, D_{out}) = (j,k)$, is given by:



$$P_{deg,out}(j,k) = P(\mathbf{D} = (j,k)) = \lim_{n \to \infty} \frac{N^{j,k}_{deg,out,n}}{N+n},$$

where $N^{h,i}_{deg,out,n}$ is the number of nodes with $h$ total links, of which $i$ are out-links, at time $n$. The last equality holds by the Law of Large Numbers. Clearly, the joint in-out degree distribution can be computed from this one, $P_{in,out}(j-k,k) = P_{deg,out}(j,k)$, and the in-degree and degree probabilities follow by taking marginal distributions.

The state $N_{deg,out,n+1}$ depends on: 1) $N_{deg,out,n}$, and 2) the transition probabilities, $\pi_{deg,out,n+1}$. As is usual for Markov chains, we associate with the transition probabilities of this chain some random variables that we now describe. In the first place, there is the out-degree, $D_{out}$, of the new node. Secondly, we consider at each time $n+1$ a sequence of independent and identically distributed bivariate random vectors $\{Z_i\}$, taking the value $(j,k)$, $j,k \in \mathbb{N}$, with probability $\pi^{j,k}_{deg,out,n+1}$, which depends on the state of the chain at time $n$. This way, the growing network dynamics can be written as:

$$N^{j,k}_{deg,out,n+1} = N^{j,k}_{deg,out,n} + \Delta^{j,k}_{n+1}, \qquad (5)$$

where

$$\Delta^{j,k}_{n+1} = \begin{cases} \sum_{i=1}^{D_{out}} \left( \delta_{Z_i=(j-1,k)} - \delta_{Z_i=(j,k)} \right) & \text{for } j > k, \\ \delta_{D_{out}=j} - \sum_{i=1}^{D_{out}} \delta_{Z_i=(j,j)} & \text{for } j = k. \end{cases} \qquad (6)$$

The random vector $Z_i$ indicates the type of node to which the $i$-th link (of the new node) points. For example, if $Z_1 = (3,2)$, a new link points to an existing node with 2 out-links and 1 in-link (3 total links). Clearly, in order to have a good representation of the growing network process, the probability law of $Z_i$ must be equal to $\pi_{deg,out,n+1}$, as we impose. Equations 5 and 6 can be read in the following way: if at time $n+1$ a new node with $D_{out} = m$ out-links is added, then $N^{m,m}_{deg,out,n+1}$ grows by one, and $m$ components of the degree vector undergo a "shift". As the network continues to grow, the goal is to find whether there exists a limit distribution for the in-out degree. For very large values of $n$, given a randomly selected node, what is the probability that it has $j$ links, of which $k$ are out-links, $P_{deg,out}(j,k)$? The following property shows a way of computing $P_{deg,out}(j,k)$ which is of interest in itself.

Property: $P_{deg,out}(j,k)$ is the solution of:

$$P_{deg,out}(j,k) = \langle \Delta^{j,k}_{n+1} / e_n \rangle \quad \forall j \ge k \in \mathbb{N}, \qquad (7)$$

where $e_n$ is the event that imposes that the empirical distribution at time $n$ is equal to the stationary distribution, i.e. $e_n = \{ N^{h,i}_{deg,out,n}/(N+n) = P_{deg,out}(h,i) \ \forall h,i \in \mathbb{N} \}$.



The preceding property says that if the process at time $n$ is distributed according to the stationary probability, $P_{deg,out}$, it will remain there in expectation. This technique for finding stationary probabilities seems much easier (see Appendix B) than previous approaches [1, 5, 18].

Using the property mentioned above and eq. 6, it is easy to see that the stationary joint deg-out distribution, $P_{deg,out}$, satisfies:

$$P_{deg,out}(j,k) = \begin{cases} \langle D_{out} \rangle \left( \pi^{j-1,k}_{deg,out} - \pi^{j,k}_{deg,out} \right) & \text{for } j > k, \\ P_{out}(k) - \langle D_{out} \rangle \, \pi^{k,k}_{deg,out} & \text{for } j = k, \end{cases} \qquad (8)$$

where $\langle D_{out} \rangle = \sum_{k=0}^{\infty} k P_{out}(k)$. These two equations contain all the information about the limit joint in-out degree distribution, and are a crucial result of this paper. It is important to note that since we have conditioned on the fact that at time $n$ the process is distributed according to the stationary probability, the link attachment probability does not depend on time. Here $\pi^{j,k}_{deg,out}$ denotes the stationary probability that a new link (from a new node) points to an existing node with $j-k$ in-links and $k$ out-links. Under preferential linking on degree with attractiveness (eq. 3), the stationary attachment law reads:

$$\pi^{j,k}_{deg,out} = \frac{(j + A) P_{deg,out}(j,k)}{\langle D \rangle + A}, \qquad (9)$$

where $\langle D \rangle = \sum_{k=1}^{\infty} k P_{deg}(k)$. The marginal distribution $\pi_k = \sum_{j=1}^{k} \pi^{k,j}_{deg,out}$ is the stationary version of $\pi_k^{n+1}$ presented in eq. 3. Replacing eq. 9 in eq. 8 and using $\langle D \rangle = 2 \langle D_{out} \rangle$ (for each new node with $k$ out-links, the total degree increases by $2k$) we obtain:
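
$$P_{deg,out}(j,k) = \frac{\Psi(j + A, 3 + \delta)}{\Psi(k + A, 2 + \delta)} P_{out}(k), \qquad (10)$$

where $\Psi(a,b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)} = \int_0^1 t^{a-1}(1-t)^{b-1} dt$ (Beta function), and $\delta = A/\langle D_{out} \rangle$. From eq. 10, taking marginal distributions, it is trivial to obtain: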

$$\begin{aligned} &(a) \quad P_{in,out}(j,k) = \frac{\Psi(j+k+A, 3+\delta)}{\Psi(k+A, 2+\delta)} P_{out}(k), \\ &(b) \quad P(k) = \sum_{j=1}^{k} \frac{\Psi(k+A, 3+\delta)}{\Psi(j+A, 2+\delta)} P_{out}(j), \\ &(c) \quad P_{in}(k) = \sum_{j=1}^{\infty} \frac{\Psi(k+j+A, 3+\delta)}{\Psi(j+A, 2+\delta)} P_{out}(j). \end{aligned} \qquad (11)$$

Eq. 11 shows the joint stationary in-out degree probability, the degree distribution and the in-degree distribution. In the stationary regime (for the probability) the proportion of nodes with $j$ in-links and $k$ out-links (eq. 11 (a)) depends on the attractiveness, and on the out-degree distribution through two quantities: $\langle D_{out} \rangle$ and $P_{out}(k)$. The same happens for $P(k)$ and $P_{in}(k)$. Eq. 11 (b) shows the stationary degree probability for an arbitrary out-degree distribution (see Appendix B for a simpler derivation). Note that just by replacing $P_{out}(k)$ by $\delta_{k=m}$ (this means a non-random $D_{out}$ equal to $m$) we obtain the known result [5] for undirected networks. Eq. 11 (c) constitutes one of the main results of the paper. Replacing $P_{out}(k)$ by the empirical value, we can check whether the model is adequate for the network under study. Moreover, it is possible to see that a superposition principle does not hold, either for $P(k)$, $P_{in}(k)$, or $P_{in,out}(k,j)$. They cannot be written as $P(k) = \sum_{j=1}^{\infty} P_{out}(j) Q_j(k)$, where $Q_j(k)$ is the stationary probability for a fixed number $j$ of out-links. The superposition principle is valid for the three limit distributions only when the attractiveness vanishes (preferential linking). In this way, the preferential linking generalization (the inclusion of attractiveness) introduced in [5] has the advantage of enlarging the range of power exponents of the degree distribution, with the drawback of losing the superposition principle. If we allow the appearance of new nodes with zero out-links ($P(D_{out} = k) = P_{out}(k)$ with $k \in \mathbb{N}_0$), then the results presented in equations 11 (b) and (c) still hold after switching the initial index in the summation from 1 to 0 and taking $k \in \mathbb{N}_0 = \mathbb{N} \cup \{0\}$. In this last case, the attractiveness must be greater than or equal to zero (see eq. 4).
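The plug-in use of the in-degree result can be sketched numerically. The block below assumes that eq. 11 (c) is the sum over out-degrees $j$ of $P_{out}(j)$ times the Beta-function ratio $\Psi(k+j+A, 3+\delta)/\Psi(j+A, 2+\delta)$, the same ratio that appears in eq. 20; the function names `log_beta` and `in_degree_pmf` and the dictionary representation of $P_{out}$ are hypothetical.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function Psi(a, b) = Gamma(a)Gamma(b)/Gamma(a+b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def in_degree_pmf(k, p_out, attractiveness=0.0):
    """Assumed form of eq. 11 (c): stationary probability of in-degree k.

    p_out maps each out-degree j >= 1 to its probability (assumed
    normalized); j + attractiveness must be positive on the support.
    """
    mean_out = sum(j * pj for j, pj in p_out.items())
    delta = attractiveness / mean_out
    total = 0.0
    for j, pj in p_out.items():
        log_ratio = (log_beta(k + j + attractiveness, 3 + delta)
                     - log_beta(j + attractiveness, 2 + delta))
        total += pj * exp(log_ratio)
    return total


# Out-degree 1 or 2 with probabilities 0.4 and 0.6, no attractiveness.
p_out = {1: 0.4, 2: 0.6}
print([in_degree_pmf(k, p_out) for k in range(5)])
```

Working with logarithms of Gamma functions avoids overflow for large $k$; replacing `p_out` by an empirical out-degree histogram gives the plug-in check described above.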



B. Descriptive Statistics 

Before trying to describe a real network by a model, some preliminary checks are advisable. One typical measure that has been extensively used is the clustering coefficient, which is a measure of how connected the neighbors of a node are. We are going to discuss much simpler descriptive measures that also serve as tools in the search for the "best" model. It is therefore important to have analytical tools for comparing with real data in the search for a good model.



1. Covariance and conditional expectation 

A measure of dependence between the in-degree and the out-degree can give an idea of which attachment law better describes the empirical data. The covariance between $D_{out}$ and $D_{in}$, $Cov(D_{in}, D_{out}) = \langle D_{in} D_{out} \rangle - \langle D_{in} \rangle \langle D_{out} \rangle$, is an adequate statistical measure for this purpose. For example, in the case where the law of attachment is preferential linking on in-degree (eq. 2) this measure is obviously zero. For the case studied in detail here, preferential linking on degree (eq. 3), it is straightforward to see that the covariance between $D_{out}$ and $D_{in}$, in the particular case $A = 0$, satisfies the following equation:

$$Cov(D_{in}, D_{out}) = \tfrac{1}{2} Cov(D, D_{out}) = Var(D_{out}), \qquad (12)$$

where $Var(D_{out}) = Cov(D_{out}, D_{out})$. The covariance is always positive, or zero for non-random $D_{out}$, as expected for this type of attachment law. Eq. 12 can instead be written in terms of the correlation, $r = \frac{Cov(D_{in}, D_{out})}{\sqrt{Var(D_{in}) Var(D_{out})}}$, in the following way:

$$r = \sqrt{\frac{Var(D_{out})}{Var(D_{in})}}. \qquad (13)$$

It is surprising that the correlation satisfies this simple relation between the standard deviations: $r$ is the ratio between $\sigma_{out}$ ($\sqrt{Var(D_{out})}$) and $\sigma_{in}$ ($\sqrt{Var(D_{in})}$). Since the correlation coefficient is always less than or equal to 1, we obtain the following inequality:

$$Var(D_{out}) \le Var(D_{in}). \qquad (14)$$

Although for a real network it is very easy to estimate the variances of the numbers of out- and in-links, and also the covariance (or correlation) between the in- and out-degree, these measures are not typically reported (see Appendix C for results on the WWW network).
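For a real network these quantities can be estimated directly from the per-node degree lists. The sketch below uses population (divide by n) moments; the function name `degree_correlation` and the toy data are hypothetical. Under preferential linking on degree with $A = 0$, eq. 13 suggests that the second and third returned values should be close to each other.

```python
def degree_correlation(in_deg, out_deg):
    """Return (covariance, correlation r, sigma_out / sigma_in)."""
    n = len(in_deg)
    mean_in = sum(in_deg) / n
    mean_out = sum(out_deg) / n
    cov = sum((i - mean_in) * (o - mean_out)
              for i, o in zip(in_deg, out_deg)) / n
    var_in = sum((i - mean_in) ** 2 for i in in_deg) / n
    var_out = sum((o - mean_out) ** 2 for o in out_deg) / n
    r = cov / (var_in * var_out) ** 0.5
    return cov, r, (var_out / var_in) ** 0.5


# Toy degree lists for four nodes (illustrative only).
cov, r, sigma_ratio = degree_correlation([0, 1, 2, 5], [1, 1, 2, 4])
print(cov, r, sigma_ratio)
```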

On the other hand, the first term on the right-hand side of the covariance always satisfies:

$$\langle D_{in} D_{out} \rangle = \sum_{k \in \mathbb{N}} k \, \langle D_{in} / D_{out} = k \rangle P_{out}(k), \qquad (15)$$

where $\langle D_{in} / D_{out} = k \rangle$ is the conditional expectation of the number of in-links given that the node has $k$ out-links. From equations 12 and 15 it is very easy to see that:

$$\langle D_{in} / D_{out} \rangle = D_{out}. \qquad (16)$$

The relationship between $\langle D_{in} / D_{out} \rangle$ and $D_{out}$ can be a second check to make before modeling. For a real network this can be done in the following way: choose all the nodes that have a given number $D_{out}$ of outgoing links, and take the mean of the number of in-links over this set of nodes. If the conditional mean is equal to $D_{out}$ for all values of $D_{out}$, then this is an indication that the model may be adequate.
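The grouping procedure just described takes only a few lines. The function name `conditional_mean_in_degree` and the toy data are hypothetical; the input is one in-degree and one out-degree per node.

```python
from collections import defaultdict


def conditional_mean_in_degree(in_deg, out_deg):
    """Map each observed out-degree k to the mean in-degree of nodes with D_out = k."""
    groups = defaultdict(list)
    for i, o in zip(in_deg, out_deg):
        groups[o].append(i)
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}


# Two nodes with one out-link and two nodes with two out-links.
print(conditional_mean_in_degree([0, 2, 4, 3], [1, 1, 2, 2]))
```

Plotting the returned means against the out-degree and comparing with the diagonal gives the informal check of the $A = 0$ case.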



For non-null attractiveness it is hard to obtain analytical results; nevertheless, we compute $\langle D_{in} / D_{out} \rangle$ numerically for different values of $D_{out}$ and attractiveness. From eq. 11 (a) and the definition of conditional expectation, it is easy to obtain:

$$\langle D_{in} / D_{out} \rangle = \sum_{j=1}^{\infty} j \, \frac{\Psi(j + D_{out} + A, 3 + \delta)}{\Psi(D_{out} + A, 2 + \delta)}. \qquad (17)$$

Fig. 2 (a) shows the numerical results for $\langle D_{in} / D_{out} \rangle$ based on eq. 17. For any value of the attractiveness and $\langle D_{out} \rangle$, the conditional expectation follows a linear relation with $D_{out}$:

$$\langle D_{in} / D_{out} \rangle = f(A, \langle D_{out} \rangle) D_{out} + g(A, \langle D_{out} \rangle). \qquad (18)$$

The slope, $f(A, \langle D_{out} \rangle)$, and the intercept, $g(A, \langle D_{out} \rangle)$, of this straight line satisfy:

$$\lim_{A \to \infty} f(A, \langle D_{out} \rangle) = 0, \qquad \lim_{A \to \infty} g(A, \langle D_{out} \rangle) = \langle D_{out} \rangle, \qquad (19)$$

as shown in Fig. 2 (b) and (c). For positive values of attractiveness the slope is smaller than one, going to zero as the attractiveness goes to infinity. In the case $A \to \infty$, $D_{in}$ and $D_{out}$ are independent (always with the same expectation). Finally, for negative values of $A$ the slope is greater than one. Studying the empirical relationship between $\langle D_{in} / D_{out} \rangle$ and $D_{out}$ can give some insight into the model. Moreover, if this relationship is linear, from Fig. 2 (b) and (c) it is possible to obtain a first estimate of the attractiveness. In Appendix C we show the statistical measures presented here for the WWW network.
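The numerical computation of the conditional expectation can be sketched by truncating the infinite sum. The block assumes that eq. 17 is the sum over $j$ of $j \, \Psi(j + D_{out} + A, 3 + \delta)/\Psi(D_{out} + A, 2 + \delta)$, i.e. the conditional expectation implied by the Beta-function ratios of the joint distribution; the names `cond_mean_in` and `slope`, the truncation at 20000 terms, and the two evaluation points are hypothetical choices.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function Psi(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def cond_mean_in(d_out, attractiveness, mean_out, terms=20000):
    """Truncated sum for <D_in / D_out = d_out> (assumed form of eq. 17)."""
    delta = attractiveness / mean_out
    log_den = log_beta(d_out + attractiveness, 2 + delta)
    return sum(
        j * exp(log_beta(j + d_out + attractiveness, 3 + delta) - log_den)
        for j in range(1, terms + 1)
    )


def slope(attractiveness, mean_out, k1=5, k2=10):
    """Finite-difference slope of the conditional mean between k1 and k2."""
    y1 = cond_mean_in(k1, attractiveness, mean_out)
    y2 = cond_mean_in(k2, attractiveness, mean_out)
    return (y2 - y1) / (k2 - k1)


for a in (-2.0, 0.0, 8.0):
    print(a, slope(a, mean_out=8.0))
```

The terms of the sum decay only as a power of $j$, so the truncation error shrinks slowly, most noticeably for negative attractiveness; the printed slopes are therefore approximations meant for comparison with the qualitative statements about Fig. 2.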



FIG. 2: (a) Conditional expectation of in-degree given the out-degree. Each straight line corresponds to a different value of attractiveness (specified in the graph). (b) Slope and (c) intercept of the straight lines shown in (a) as a function of the attractiveness for two different values of $\langle D_{out} \rangle$.



It is important to note that equations 12 (which includes 13 and 14) and 18 (which includes 16) hold for any out-degree distribution ($P_{out}(k)$). These results do not depend on the details (shape) of the out-degree distribution. Nevertheless, there exist some measures that do not share this nice property. For example, the conditional expectation of the number of out-links given the number of in-links, $\langle D_{out} / D_{in} \rangle$, depends explicitly on $P_{out}(k)$, as can be seen in the following equation:

$$\langle D_{out} / D_{in} = k \rangle = \frac{\sum_{j=1}^{\infty} j \, \frac{\Psi(k+j+A, 3+\delta)}{\Psi(j+A, 2+\delta)} P_{out}(j)}{\sum_{h=1}^{\infty} \frac{\Psi(h+k+A, 3+\delta)}{\Psi(h+A, 2+\delta)} P_{out}(h)}. \qquad (20)$$

Next, we present another measure useful for model selection.
2. Relationship between the distribution tails



Now, we study the relationship between the tails of the in-degree and the out-degree distributions. In the case $A = 0$, if the out-degree distribution has finite expectation ($\langle D_{out} \rangle < \infty$) and a scale-invariant tail, $P_{out}(k) \sim k^{-(2+\beta)}$, it is not difficult (from eq. 11 (b)) to see that the limit degree distribution and the in-degree distribution have the following tail behavior:

$$P(k) \sim P_{in}(k) \sim \begin{cases} k^{-(2+\beta)} & 0 < \beta < 1, \\ \log(k) \, k^{-3} & \beta = 1, \\ k^{-3} & \beta > 1. \end{cases} \qquad (21)$$
Eq. 21 constitutes our second main result: if the out-degree distribution has finite expectation and a scale-invariant tail, $P_{out}(k) \sim k^{-(2+\beta)}$, then the limit in-degree distribution also has a scale-invariant tail, $P_{in}(k) \sim k^{-\alpha}$. Moreover, for $0 < \beta < 1$, $\alpha$ is equal to the out-degree exponent. This last result can explain why in so many real networks the in and out power exponents are so similar, taking values in a range from 2 to 3. In the case $\beta > 1$, $\alpha = 3$, regardless of the value of $\beta$. For the frontier case (finite/infinite variance) of $\beta = 1$, the limit distribution decays at a slower rate than $k^{-3}$. Precisely, it decays as $P_{in}(k) \sim \log(k) k^{-3}$. In the general case of preferential linking with attractiveness for $P_{out}(k) \sim k^{-(2+\beta)}$, the regimes are similar to the non-attractiveness case. In this case the only difference is that there is now a separatrix curve between them, as shown in Fig. 3. The behavior is regulated by $\delta = A/\langle D_{out} \rangle$ and $\beta$. For $\delta > -1 + \beta$ the limit in-degree distribution satisfies $P_{in}(k) \sim k^{-(2+\beta)}$, and in this case the (in-)degree distribution has exactly the same tail as the out-degree distribution, even for large $\beta$. For $\delta < -1 + \beta$, $P_{in}(k)$ behaves as $k^{-(3+\delta)}$. Finally, on the separatrix curve, $\delta = -1 + \beta$, the behavior is given by $\log(k) k^{-(3+\delta)}$. Note that $\delta$ ($A/\langle D_{out} \rangle$) cannot be smaller than $-1$, since $\langle D_{out} \rangle$ must be (see eq. 4) greater than $-A$.

For out-degree distributions with exponential tails, such as geometric, Poisson, or finite-range distributions, the in-degree distribution satisfies $P_{in}(k) \sim k^{-(3+\delta)}$, even for negative values of $\delta$. In [11] it is shown that for the PRL citation network the out-degree distribution has an exponential decay, and the in-degree distribution has a power-law tail with $\alpha$ near 3, just as described before for the null-attractiveness case. We remark the following: a) if the model is adequate for describing a real growing network, and this network has an out-degree distribution with an exponential tail and a scale-invariant in-degree distribution with a power between 2 and 3, then the attractiveness parameter must be negative, and b) if the empirical in-degree distribution has a scale-invariant tail with a power less than 2, then the model presented here is not adequate for describing this network. Keeping in mind the last point, the new estimates [12] of the in-degree power exponent of the WWW network would rule out the model for describing this particular network.
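The tail regimes can be probed numerically by estimating a local exponent from two values of the in-degree probability. As before, the block assumes that the stationary in-degree probability is the sum over $j$ of $P_{out}(j) \, \Psi(k+j+A, 3+\delta)/\Psi(j+A, 2+\delta)$; the function names, the truncation of the out-degree supports, and the evaluation points $k$ and $2k$ are hypothetical choices.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Logarithm of the Beta function Psi(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def in_degree_pmf(k, p_out, attractiveness=0.0):
    """Assumed form of the stationary in-degree probability (see lead-in)."""
    mean_out = sum(j * pj for j, pj in p_out.items())
    delta = attractiveness / mean_out
    return sum(
        pj * exp(log_beta(k + j + attractiveness, 3 + delta)
                 - log_beta(j + attractiveness, 2 + delta))
        for j, pj in p_out.items()
    )


def local_exponent(p_out, k=1000, attractiveness=0.0):
    """Estimate alpha in P_in(k) ~ k^(-alpha) from the values at k and 2k."""
    p1 = in_degree_pmf(k, p_out, attractiveness)
    p2 = in_degree_pmf(2 * k, p_out, attractiveness)
    return -log(p2 / p1) / log(2)


# Exponential tail: geometric out-degree on j >= 1 (truncated support).
geometric = {j: 0.2 * 0.8 ** (j - 1) for j in range(1, 200)}
# Scale-invariant tail with beta = 0.5, i.e. P_out(j) ~ j^(-2.5), truncated.
norm = sum(j ** -2.5 for j in range(1, 50001))
power_law = {j: j ** -2.5 / norm for j in range(1, 50001)}

print(local_exponent(geometric))
print(local_exponent(power_law))
```

If the regimes described above hold, the two printed estimates are expected to land near 3 and near 2.5 respectively, up to finite-$k$ corrections.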
FIG. 3: Stationary in-degree probability tail under preferential linking with attractiveness for an out-degree distribution with $P_{out}(k) \sim k^{-(2+\beta)}$ as a function of $\delta = A/\langle D_{out} \rangle$ and $\beta$. The horizontal axis corresponds to preferential linking ($A = 0$). On the separatrix curve, $\delta = \beta - 1$, $P_{in}(k) \sim \log(k) k^{-(3+\delta)}$.



C. Application: scientific publications network 



The scientific publications network has two advantages 
that define it as the most "pure": 1) extremely few dou- 
ble arrows, and 2) all the variability in the number of 
out-links is "intrinsic". These two features guarantee 
that our model (see Fig. [T]) is adequate for describing 
the scientific network. Nevertheless, it is not clear which 
is the attachment law (w) such that we can obtain a good 
mimic of the growing network process. 

Fig. [4] shows the citation distribution for all scientific 
publications published in 1981 from the ISI dataset cited 
between 1981 and 1997 (see [5]). Clearly, this distribu- 
tion represent the in-degree one (see Appendix D). Un- 
fortunately the out-degree distribution (P ou t(k)), i.e. the 
number of references that has a randomly selected paper, 
has not been reported. This makes impossible to test the 
growing model by a plug-in approach (see eq. 11 (c)). 
Nevertheless, we take the following strategy: we suppose 
a geometric out-degree distribution P ou t(k) = p(l — p) k 
with k G N a , a preferential linking on degree attach- 
ment law (eq. [9] with A = 0), and finally we estimate p. 
Probably the empirical out-degree distribution (P ou t(k)) 
does not fall in any family of parametric distributions. 
However, a well estimated in-degree distribution will be 
a positive result, since the in-degree distribution is ob- 
tained as a result of a theoretical computation based on 
the out-degree distribution. In order to estimate p, we 
first compute the average number of citations in the ISI 
network ((cites) = 8.573) and impose the condition that 
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^P(j), -P/^ eo correspond to the theoretical in-degree 

3=0 



i r 

1000 10000 



Number of citations (k) 



FIG. 4: Citation distribution for all papers published in 1981 
(from the ISI) cited between 1981 and 1997. The theoret- 



ical citation (in-degree) curves are calculated by eq. 11 (c) 
assuming that A=0, and the out-degree distribution is geo- 
metric, Pout(k) = p(l — p) for k € N . The dashed line 
correspond to p = 0.104 (T = 0.115), and the solid one 
to p = 0.0817 (T = 0.023) but with P out (0) = 0.3 and 
Pout(k) = 0.7622781p(l - p) k for k G TV. Inset: Difference 
between the empirical cumulative distribution and the theo- 
retical cumulative distribution. Data from |15j . 



(cites) — (references) — kP out (k) = 8.573 we ob- 

k=0 

tain p = 1/(9.573). The dashed line in Fig. [2] corre- 
spond to this case. If we estimate separately the case 
k = 0, and assume that the out-degree distribution is 
such that P out (0) = a, P ou t(k) = cp(l - p) k for k e N 
with c = (1 — a)/(l— p), we obtain p = (1 — a)/8.573 after 
taking the mean value condition. Curiously, for a = 0.3 
(p = 0.0817) the theoretical in-degree probability (solid 
line) is extremely similar to the empirical one in all the 
range of the distribution, which can not be achieved with 
an oversimplified model where P ou t(k) = 8k=m- This is 
not the only P ou t(k) that fits perfectly well, hence we 
do not assert that the estimated P ou t(k) must be similar 
to the real cites distribution. Moreover, the estimated 
Pout(k) does not seem very adequate, since under this 
probability distribution 30% of all scientific publications 
do not contain any reference (yet, note that in |10j it was 
reported that 10% of all publications do not contain any 
reference) . 

In order to have a better notion of the goodness of fit we compute the Kolmogorov statistic,

$$T = \max_{k \in \mathbb{N}} |G(k)| = \max_{k \in \mathbb{N}} \left| F_{P_{in}}(k) - F_{P_{in}^{theo}}(k) \right| \tag{22}$$

where $F_P(k)$ is the cumulative distribution, $F_P(k) = \sum_{j \le k} P(j)$, $P_{in}^{theo}$ is the theoretical in-degree distribution shown in eq. 11 (c) assuming a particular $P_{out}(k)$, and $P_{in}$ corresponds to the empirical citation distribution. One advantage of the proposed estimator in eq. 22 is that it is possible to test whether the model (including the attachment law) is adequate for describing the real network. In our application, the null hypothesis is $H_0$: the real growing network has an underlying link attachment law that is preferential on degree. For the simplest case where $T$ compares an empirical distribution with a theoretical one, but without estimating parameters, the null hypothesis will be rejected (at a 0.05 level of significance) only if $T > 0.0015$. In the case shown with the solid line $T = 0.023$, and for the case where $P_{out}(k)$ is geometric (dashed line) $T = 0.115$. Clearly, $T$ is a good measure for ranking models (or model selection). The inset of Fig. 4 shows the function $G(k)$ for both out-degree distributions proposed: for the geometric case (dashed line) the maximum distance between the cumulative distributions (see eq. 22) occurs at $k = 0$, and for the other case (solid line) at $k = 10$.
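The statistic of eq. 22 is simple to evaluate numerically. The following minimal Python sketch assumes that both distributions are given as probability mass functions indexed by $k = 0, 1, 2, \ldots$; the function name `kolmogorov_statistic` and the toy numbers are illustrative only and are not the citation data used in this paper.

```python
from itertools import accumulate


def kolmogorov_statistic(p_emp, p_theo):
    """Return (T, k_star): T = max_k |F_emp(k) - F_theo(k)| and the k where it occurs.

    p_emp and p_theo are probability mass functions indexed by k = 0, 1, 2, ...
    The shorter one is padded with zeros (an assumption about the support).
    """
    n = max(len(p_emp), len(p_theo))
    emp = list(p_emp) + [0.0] * (n - len(p_emp))
    theo = list(p_theo) + [0.0] * (n - len(p_theo))
    # Cumulative distributions F(k) = sum_{j <= k} P(j).
    f_emp = list(accumulate(emp))
    f_theo = list(accumulate(theo))
    # |G(k)| = |F_emp(k) - F_theo(k)|; T is its largest value.
    gaps = [abs(a - b) for a, b in zip(f_emp, f_theo)]
    k_star = max(range(n), key=lambda k: gaps[k])
    return gaps[k_star], k_star


# Toy example (hypothetical numbers, not the data of this paper).
p_emp = [0.5, 0.3, 0.2]
p_theo = [0.4, 0.4, 0.2]
T, k_star = kolmogorov_statistic(p_emp, p_theo)
print(T, k_star)
```

Returning the location of the maximum alongside $T$ mirrors the way the text reports where the largest gap between the cumulative distributions occurs.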

As we mentioned at the beginning of this section, the model is adequate for the scientific publication network, but the attachment law is completely unknown. We have proposed one, preferential linking on degree, but we have no way of corroborating it. This is one of the reasons why we are going to study the model under different attachment laws. The only weak argument in favor of the law given by eq. 3 is that review papers, which have a huge number of references, are typically highly cited compared with regular articles, which have a small number of references. In this way, the correlation between $D_{in}$ and $D_{out}$ will be positive, which is a virtue of the law defined in eq. 3.



D. Different attachment laws 

Clearly, it may happen that for a real network the informal checks (covariance, variance and conditional expectation) discussed before are not consistent with the observables of the model. In this case, three things may be happening: 1) the link attachment law is not adequate, 2) the model is not correct, or 3) both. The first point is related to the mechanism of linking: preferential, uniform, nonlinear preferential, or with some age dependency as described in [16, 17]. The second point corresponds to the growing mechanism, which can be seen as the core of the model. For example, updating of nodes, or a very high proportion of double links, may be present, neither of which is considered in the model. In this section we discuss only the alternative where the attachment law is different from the one proposed in eq. 3 (preferential linking on degree), but the core of the model remains true.
1. Preferential linking on in-degree



In [5] the authors studied a model where the attachment law depends on the in-degree and on the attractiveness. The proposed law was the following:

$$\pi_k^{in} = \frac{(A + k) N_k^{in}}{\sum_{j \in \mathbb{N}} (A + j) N_j^{in}} \tag{23}$$

where $N_k^{in}$ is the number of nodes with in-degree equal to $k$. In principle, this can be a good law for the scientific publications network. The joint attachment law in this case is given by:



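$$\pi_{j,k}^{deg,out} = \frac{j - k + A}{\langle D_{out} \rangle + A} P_{deg,out}(j,k), \tag{24}$$

where we have used that $\langle D_{in} \rangle = \sum_{k=0}^{\infty} k P_{in}(k) = \langle D_{out} \rangle$.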



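Replacing eq. 24 in eq. 8 it is very easy to compute the stationary probabilities:

$$\begin{aligned}
P_{in}(k) &= \frac{\Psi(k + A, 2 + \delta)}{\Psi(A, 1 + \delta)} && (a) \\
P(k) &= \frac{1}{\Psi(A, 1 + \delta)} \sum_{j=0}^{k} P_{out}(j)\, \Psi(k - j + A, 2 + \delta) && (b) \\
P_{in,out}(j,k) &= P_{deg,out}(j + k, k) = P_{in}(j) P_{out}(k) && (c)
\end{aligned} \tag{25}$$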

where $k, j \in \mathbb{N}$. This case is especially easy to solve because, for a randomly selected node, the number of out-links ($D_{out}$) and the number of in-links ($D_{in}$) are independent random variables ($P_{in,out}(k,j) = P_{in}(k) P_{out}(j)$). This means:

$$\begin{aligned}
r &= 0 && (a) \\
\langle D_{in} / D_{out} = k \rangle &= \langle D_{out} \rangle && (b) \\
\langle D_{out} / D_{in} = k \rangle &= \langle D_{out} \rangle && (c)
\end{aligned} \tag{26}$$



One big difference between the previous attachment law (eq. 3) and this one (eq. 23) is that $P_{in}(k)$ depends only on the mean number of out-links ($\langle D_{out} \rangle$) through $\delta$ ($\delta = A/\langle D_{out} \rangle$), and not on the shape of the out-degree distribution ($P_{out}(k)$). For $A > 0$ and $k \gg 1$, $P_{in}(k)$ behaves as $k^{-(2+\delta)}$ no matter what $P_{out}(k)$ is (it depends only on $\langle D_{out} \rangle$). Therefore, under the attachment law given by eq. 23 the tail of the out-degree distribution does not give any information about the tail of the in-degree distribution, contrary to what happens for the law of eq. 3. In addition, for this new attachment law the correlation between $D_{in}$ and $D_{out}$ is zero (eq. 26 (a)), and the conditional expectation of $D_{in}$ ($D_{out}$) given $D_{out}$ ($D_{in}$) $= k$ does not depend on $k$ (eq. 26 (b) and (c)).



Note that the attachment law in eq. 23 is well defined only for positive or zero values of attractiveness. But only strictly positive values of $A$ are interesting, since for $A = 0$ we get that the stationary probability is $P_{in}(k) = \delta_{k=0}$. This last result is easy to understand: new nodes appear but they cannot be pointed to by other nodes ($A = 0$), and in this way the network will be formed by almost all nodes with zero in-links and only a few (given by the initial condition of the network) with many in-links. Clearly, in the limit $n \to \infty$ the proportion of nodes with $k$ in-links goes to a delta function ($\delta_{k=0}$).
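The behavior of the in-degree law of eq. 25 (a) can be checked numerically. The sketch below is illustrative and rests on an assumption that is not stated in this section: the two-argument function appearing in eq. 25 is taken to be the Euler Beta function, $\Gamma(a)\Gamma(b)/\Gamma(a+b)$. With that reading the probabilities sum to one and the tail exponent is $2 + \delta$, in agreement with the text. The parameter values $A = 1$ and $\langle D_{out} \rangle = 2$ are arbitrary.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Logarithm of the Euler Beta function (ASSUMED meaning of the function in eq. 25)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def p_in(k, A, mean_out):
    """In-degree probability of eq. 25 (a), with delta = A / <D_out>."""
    delta = A / mean_out
    return exp(log_beta(k + A, 2.0 + delta) - log_beta(A, 1.0 + delta))


# Arbitrary illustrative parameters (not fitted to any data).
A, mean_out = 1.0, 2.0
delta = A / mean_out

# Normalization check over a long but finite range of k.
total = sum(p_in(k, A, mean_out) for k in range(100_000))

# Local slope of log P_in(k) against log k, estimated between k and 2k.
tail_exponent = log(p_in(20_000, A, mean_out) / p_in(10_000, A, mean_out)) / log(2.0)

print(total, tail_exponent, -(2.0 + delta))
```

Note that only $A$ and $\langle D_{out} \rangle$ enter the computation, which illustrates why the shape of $P_{out}(k)$ plays no role under this attachment law.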



2. Uniform attachment law 

It is thus clear that even when preferential linking is 
an accepted mechanism of link attachment, it is neces- 
sary to study [TBI EH] alternative types. For the uniform 
attachment law on degree: 



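$$\pi_k = \frac{N_k}{\sum_{j \in \mathbb{N}} N_j}, \qquad \pi_{j,k}^{deg,out} = P_{deg,out}(j,k), \tag{27}$$

by means of the same technique (replacing $\pi^{deg,out}$ in eq. 8) we obtain:

$$P_{in}(k) = \frac{1}{1 + \langle D_{out} \rangle} \left( \frac{\langle D_{out} \rangle}{1 + \langle D_{out} \rangle} \right)^{k}. \tag{28}$$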



Note that $P_{in}(k)$ depends only on $\langle D_{out} \rangle$ (and not on $P_{out}(k)$), and decays exponentially fast. For an out-degree distribution with $P_{out}(k) \sim k^{-(2+\beta)}$, $P(k)$ behaves as $k^{-(2+\beta)} f(k)^{-1}$, where $f(k)$ is an increasing function of $k$ that grows more slowly than $\log(k)$. It is important to remark that for empirical (finite) networks, the $f(k)^{-1}$ term will be very difficult to discriminate ($f(k)$ grows at a rate slower than $\log(\log(k))$). This behavior may be hard to "separate" from $P(k) \sim k^{-(2+\beta)}$, but the in-degree distribution will sort out any possible confusion about the link attachment law.
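As a quick illustration of the exponential decay, the sketch below evaluates a geometric in-degree law of the form $P_{in}(k) = \frac{1}{1 + \langle D_{out} \rangle} \left( \frac{\langle D_{out} \rangle}{1 + \langle D_{out} \rangle} \right)^{k}$, which is our reading of eq. 28 and should be treated as an assumption. It checks that the law is normalized and that its mean equals $\langle D_{out} \rangle$, as required by $\langle D_{in} \rangle = \langle D_{out} \rangle$. The value $\langle D_{out} \rangle = 3$ is arbitrary.

```python
def p_in_uniform(k, mean_out):
    """Geometric in-degree law under uniform attachment (assumed form of eq. 28)."""
    q = mean_out / (1.0 + mean_out)  # constant ratio P_in(k + 1) / P_in(k)
    return (1.0 - q) * q ** k


# Arbitrary illustrative mean out-degree.
mean_out = 3.0
ks = range(2000)  # the neglected tail is of order q**2000, far below double precision

total = sum(p_in_uniform(k, mean_out) for k in ks)
mean_in = sum(k * p_in_uniform(k, mean_out) for k in ks)
ratio = p_in_uniform(11, mean_out) / p_in_uniform(10, mean_out)

print(total, mean_in, ratio)
```

The constant ratio between consecutive probabilities is what "decays exponentially fast" means here, in contrast with the power law tails obtained under preferential linking.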



III. CONCLUSIONS 

For the model presented here, we showed a simple way to compute the stationary probabilities. This model was constructed in order to take into account the main features of real directed growing networks with the property that almost all the variability in the number of out-links is "intrinsic" (see Section 2). From the stationary Property, we showed how to compute the stationary joint in-out degree distribution for an arbitrary out-degree distribution and an arbitrary link attachment law ($\pi$). We studied three different $\pi$'s, paying special attention to the preferential linking on degree with attractiveness mechanism ($\pi_k = \frac{(A + k) N_k}{\sum_{j} (A + j) N_j}$). Once the joint probability is obtained, we compute:

(1) $P_{in}(k)$ as a function of $P_{out}(k)$.

(2) The correlation between $D_{in}$ and $D_{out}$.

(3) The conditional expectation of $D_{in}$ ($D_{out}$) given $D_{out}$ ($D_{in}$).

From $P_{in}(k)$ we studied the relationship between the distribution tails, giving a possible explanation for the in/out-degree tail relationship reported for many real networks. The statistical measures mentioned in (2) and (3) were studied for the WWW network, obtaining good agreement with some of the analytical results presented in this paper. Nevertheless, we cannot say that the model is appropriate to describe this network (an important part of the variability would not be "intrinsic").

Finally, we showed an application to the scientific pub- 
lications network. In this network: 

(a) New publications continuously [21] appear (growing network) and do not disappear.

(b) The structure is rigid. Published papers cannot change their references; only new papers can change the number of citations of already published works.

(c) The forthcoming publication has an unpredictable number of references, $D_{out}$ (random variable).

(d) Even knowing $D_{out}$, the papers cited by the forthcoming publication are unpredictable (there is a law of attachment, $\pi$).

The model we proposed considers the four points mentioned above. The main difference with other models is that the number of out-links (references) of a new node (paper) is now treated as a random variable. Therefore, if the distribution of the number of references ($P_{out}(k)$) is known, an important part ((a), (b) and (c)) of the scientific network will be well described by the model. However, the distribution of the number of references of the forthcoming publication (out-degree distribution) has not been reported. In addition, the attachment law ((d)) of the scientific publication network is completely unknown, and difficult to estimate. Thus, we proposed a simple out-degree distribution (geometric) and an attachment law of preferential linking on degree (we also considered preferential linking on in-degree and uniform attachment). With these two assumptions, we found a very good fit. This application also served to discuss how to compare various models. In this matter, we proposed a measure (eq. 22) frequently used in statistics to compare two distributions.

From a modeling point of view, we see our results as a further step from which more complex models may be built in order to be closer to reality. The model can be seen as the skeleton on which to construct more sophisticated models. For example, it does not seem difficult to incorporate double links into the model (a mixed out-links distribution) in order to be closer to the metabolic network, or some updates in the nodes to mimic the WWW network. Another important issue to explore is what happens when $P_{out}(k)$ depends on time in a simple parametric way. This last point is related to accelerating
networks I2TH. 
APPENDIX A: COMMENTS ON THE MODEL

Strictly speaking, the model as presented in Section 2 is not well defined. Yet, as we discuss in this appendix, this is not a serious problem (all the results presented before hold). The difficulty is that $P_{out}(k)$ can be any probability distribution. In particular, it includes those that take infinitely many values (such as the geometric, or any distribution with exponential or power law tails). The problem can be stated as follows: if a new node has, for example, 1000 links and the network has 100 nodes, what must we do with the remaining 900 links?

We describe below the correct form of the model (which can be implemented):

(1) Initially the network consists of $n$ nodes connected in a given arbitrary way.

(2) At each time step starting from $n + 1$, say time step $m$, a node with $D_{out}^{m}$ outgoing edges appears. $D_{out}^{m}$ is a random variable with law $Q_{out}^{m}(k)$ ($Q_{out}^{m}(k) = P(D_{out}^{m} = k)$, and $\sum_{k \in \mathbb{N}} P(D_{out}^{m} = k) = 1$).

(3) Each new directed edge points to an existing node with some probability law $\pi^{m}$ (uniform, preferential linking, etc.).

The distribution of the number of out-links of a new node at time $m$ (when the network has $m - 1$ nodes) is defined by the following equation:

$$Q_{out}^{m}(k) = P(D_{out} = k / D_{out} < m). \tag{A1}$$

$Q_{out}^{m}(k)$ is the conditional distribution of $D_{out}$ given $D_{out} \le m - 1$. From definition A1 it is very easy to see that $Q_{out}^{m}(k)$ converges to $P_{out}(k)$,

$$\lim_{m \to \infty} Q_{out}^{m}(k) = P_{out}(k), \tag{A2}$$

as the network grows, where $P_{out}(k)$ is the distribution defined a priori (see Section 2). From this last convergence we can see that the model with this correction (we have only replaced $P_{out}(k)$ by $Q_{out}^{m}(k)$) has exactly the same asymptotic behavior that was obtained for the model presented in Section 2. Therefore, all the results presented in this paper also hold for the corrected model. The general conclusion would be: "small effects disappear at $\infty$". See, for instance, Section 2.4.1, where we discuss why for $A = 0$, $P_{in}(k)$ converges to $\delta_{k=0}$.
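The convergence of the truncated law to the a priori one can be made concrete with a small numerical sketch. It assumes, purely for illustration, a geometric out-degree law on $k = 0, 1, 2, \ldots$ with parameter $p = 0.5$ (the parametrization is our choice, not necessarily the one used elsewhere in the paper), and it measures the largest pointwise difference between the conditional law $P(D_{out} = k / D_{out} < m)$ and $P_{out}(k)$ for increasing network sizes $m$.

```python
def p_out(k, p=0.5):
    """Geometric out-degree law on k = 0, 1, 2, ... (illustrative choice)."""
    return (1.0 - p) * p ** k


def q_out(k, m, p=0.5):
    """Truncated law P(D_out = k | D_out < m) for the geometric law above."""
    if k >= m:
        return 0.0  # a new node cannot have more links than existing nodes
    prob_below_m = 1.0 - p ** m  # P(D_out < m) = sum_{j < m} (1 - p) p^j
    return p_out(k, p) / prob_below_m


def sup_gap(m, p=0.5, k_max=200):
    """Largest |Q^m_out(k) - P_out(k)| over k = 0, ..., k_max - 1."""
    return max(abs(q_out(k, m, p) - p_out(k, p)) for k in range(k_max))


gaps = {m: sup_gap(m) for m in (10, 20, 40)}
for m, gap in gaps.items():
    print(m, gap)
```

For a light-tailed law such as this one the correction vanishes geometrically in $m$; for a power law tail the same computation would show a slower, but still vanishing, difference.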



APPENDIX C: WWW NETWORK 

As we mentioned in Section 2.2.1, it is difficult to find articles on networks that report the simple descriptive measures (covariance, variance and conditional expectation) for nodes discussed here. However, a detailed statistical analysis of the topological properties of four different WWW networks has been reported recently [12]. In [12] the covariance and the variance of the number of out-going links ($D_{out}$) and in-going links ($D_{in}$) are reported, which we give in Table 1. The first thing



APPENDIX B: A CLOSED EQUATION FOR P(k) 



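If we were only interested in the stationary degree distribution ($P(k)$), the computation would be much easier than the one presented in Section 2.1, since there is a closed equation for $P(k)$. The growing network dynamics is given by:

$$\begin{aligned}
N_k^{n+1} &= N_k^{n} + \Delta_k && (a) \\
\Delta_k &= \sum_{i=1}^{D_{out}} \mathbf{1}_{\{Y_i = k-1\}} - \sum_{i=1}^{D_{out}} \mathbf{1}_{\{Y_i = k\}} + \mathbf{1}_{\{D_{out} = k\}} && (b)
\end{aligned} \tag{B1}$$

where $\{Y_i\}_{1 \le i \le D_{out}}$ is a sequence of independent and identically distributed random variables, taking value $k$ ($k \in \mathbb{N}$) with probability $\pi_k^{n+1}$.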



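Property: $P = (P(1), P(2), \ldots, P(k), \ldots)$ is the solution of:

$$\left\langle \Delta_k \Big/ \frac{N_j}{\sum_{i \in \mathbb{N}} N_i} = P(j) \ \ \forall j \right\rangle = P(k) \qquad \forall k \in \mathbb{N}. \tag{B2}$$

Replacing $\Delta_k$ by eq. B1 (b) in eq. B2 we get

$$\left\langle \mathbf{1}_{\{D_{out} = k\}} + D_{out} (\pi_{k-1} - \pi_k) \Big/ \frac{N_j}{\sum_{i \in \mathbb{N}} N_i} = P(j) \ \ \forall j \right\rangle = P(k). \tag{B3}$$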

From this last equation it is trivial to obtain that the stationary degree probability satisfies:

$$P(k) = P_{out}(k) + (\pi_{k-1} - \pi_k) \langle D_{out} \rangle \tag{B4}$$

where $\pi_k$ is the stationary probability that a new link is attached to a node with degree $k$. Under preferential linking on degree with attractiveness, the stationary attachment law, $\pi_k$, remains equal to $\frac{(k + A) P(k)}{\langle D \rangle + A}$. Replacing $\pi$ in eq. B4 and using $\langle D \rangle = 2 \langle D_{out} \rangle$, it is easy to conclude that the limit degree distribution ($P(k)$) is given by eq. B5:

$$P(k) = \Psi(k + A, 3 + \delta) \sum_{j=0}^{k} \frac{P_{out}(j)}{\Psi(j + A, 2 + \delta)}. \tag{B5}$$
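The step from eq. B4 to eq. B5 can be verified numerically. The sketch below is a consistency check under stated assumptions, not the computation used in the paper: the stationary attachment law is taken as $\pi_k = (k + A) P(k) / (2 \langle D_{out} \rangle + A)$, the two-argument function in eq. B5 is assumed to be the Euler Beta function, and $\delta = A / \langle D_{out} \rangle$. The out-degree law `p_out_example` and the value of $A$ are arbitrary.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Euler Beta function (ASSUMED meaning of the function in eq. B5)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def mean_of(p_out):
    return sum(j * pj for j, pj in enumerate(p_out))


def degree_by_recursion(p_out, A, k_max):
    """Solve eq. B4 with pi_k = (k + A) P(k) / (2 <D_out> + A) (assumed form)."""
    delta = A / mean_of(p_out)
    result, prev = [], 0.0
    for k in range(k_max + 1):
        pk_out = p_out[k] if k < len(p_out) else 0.0
        # Eq. B4 rearranged:
        # (k + A + 2 + delta) P(k) = (2 + delta) P_out(k) + (k - 1 + A) P(k - 1)
        prev = ((2.0 + delta) * pk_out + (k - 1.0 + A) * prev) / (k + A + 2.0 + delta)
        result.append(prev)
    return result


def degree_closed_form(p_out, A, k_max):
    """Evaluate eq. B5 directly, accumulating the inner sum over j <= k."""
    delta = A / mean_of(p_out)
    result, partial = [], 0.0
    for k in range(k_max + 1):
        if k < len(p_out):
            partial += p_out[k] * exp(-log_beta(k + A, 2.0 + delta))
        result.append(partial * exp(log_beta(k + A, 3.0 + delta)))
    return result


# Arbitrary illustrative inputs (hypothetical, not fitted to any network).
p_out_example = [0.2, 0.5, 0.3]
A_example = 1.0
rec = degree_by_recursion(p_out_example, A_example, 5000)
closed = degree_closed_form(p_out_example, A_example, 5000)

max_rel_diff = max(abs(a - b) / a for a, b in zip(rec, closed))
total = sum(rec)
mean_degree = sum(k * pk for k, pk in enumerate(rec))
print(max_rel_diff, total, mean_degree, 2.0 * mean_of(p_out_example))
```

Besides the agreement between the two evaluations, the mean degree comes out as $2 \langle D_{out} \rangle$, which is the identity $\langle D \rangle = 2 \langle D_{out} \rangle$ used in the derivation.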





Network    $Cov(D_{in}, D_{out})$    $Var(D_{out})$    $Var(D_{in})$
WBGC01     155.682                   171.61            40080.04
WGUK02     524.244                   750.76            20534.89
WBGC03     348.486                   870.25            54980742
WGIT04     3478.75                   4502.41           776866

TABLE I: Descriptive statistical measures for 4 WWW networks. Data from [12].



that can be noted is that for all the domains studied $Var(D_{out}) < Var(D_{in})$, consistent with eq. 14. Moreover, $Cov(D_{in}, D_{out})$ and $Var(D_{out})$ have similar values (consistent with eq. 12); the relative difference seems large only for WBGC03. In order to compare these last two quantities in a better way, Table 2 shows $r$ and $R = \sqrt{Var(D_{out}) / Var(D_{in})}$ for the same data. We can see that WBGC01 and WGIT04 have very similar values of $r$ and $R$ (see eq. 13). In order to study the relationship between




Network    $r$       $R$
WBGC01     0.0594    0.0654
WGUK02     0.1335    0.1912
WBGC03     0.0016    0.004
WGIT04     0.0588    0.0761

TABLE II: Correlation ($r$) and $R$ for 4 WWW networks. Data computed from Table 1.



$\langle D_{in}/D_{out} \rangle$ and $D_{out}$ it is necessary to have the complete data. At this point, we analyze the WWW data obtained from [13] presented in [2]. We built a database with the number of out-links and in-links (($D_{out}$, $D_{in}$)) for each of the 325729 nodes. In order to have a good estimation of the conditional expectation, we first restrict the study to the values of $D_{out}$ for which there exist at least 500 nodes. Fig. 5 (a) shows the relationship between $D_{out}$ and the conditional mean of $D_{in}$ given $D_{out}$ ($\langle D_{in}/D_{out} \rangle$). Interestingly, there is a strong relationship between both. For values of $D_{out}$ smaller than 20 there is a clear linear relationship between them. A robust regression (least median of squares) estimation between $\langle D_{in}/D_{out} \rangle$ and $D_{out}$ gives a slope of 0.523 and an intercept of 1.739. When $D_{out}$ is greater than 20 it seems that $\langle D_{in}/D_{out} \rangle$ grows faster than linearly, but it is not clear whether this effect is real (based on Fig. 5 (b)). The graph presented in Fig. 5 (b) is similar to the one in (a), but now we study the values of $D_{out}$ for which there exist at least 30 nodes. Two different representations of the joint in-out distribution are given in Fig. 5 (c) and (d), to give an idea of the shape of the joint law, while (e) shows a scatter plot on a larger grid. Besides, the in-degree variance ($Var(D_{in}) = 1346.85$) is greater than the out-degree one ($Var(D_{out}) = 461.25$), consistent with eq. 14. Fig. 5 (f) shows the conditional standard deviation of $D_{in}$ given $D_{out}$, $\sigma_{in/out} = \sqrt{Var(D_{in}/D_{out})}$. Unlike the conditional expectation, the conditional variance does not seem to have any relationship with $D_{out}$.

In [14] the authors showed the empirical out-degree ($P_{out}(k)$) and in-degree ($P_{in}(k)$) distributions (see Fig. 6), and reported a power law exponent of 2.45 for the out-degree and of 2.1 for the in-degree [25]. This is the first empirical evidence that the model presented here cannot describe the WWW network well, since in the model the power law exponents are equal. The second piece of evidence is that $r$ and $R$ are not similar: $r = 0.2244$ and $R = 0.5852$.
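The entries of Table II can be recomputed from Table I. The short sketch below assumes that $r$ is the usual correlation coefficient, $Cov(D_{in}, D_{out}) / \sqrt{Var(D_{in}) Var(D_{out})}$, and that $R = \sqrt{Var(D_{out}) / Var(D_{in})}$; both readings reproduce the tabulated values to the precision shown. The dictionary and function names are illustrative.

```python
from math import sqrt

# (Cov(D_in, D_out), Var(D_out), Var(D_in)) as listed in Table I.
TABLE_I = {
    "WBGC01": (155.682, 171.61, 40080.04),
    "WGUK02": (524.244, 750.76, 20534.89),
    "WBGC03": (348.486, 870.25, 54980742.0),
    "WGIT04": (3478.75, 4502.41, 776866.0),
}


def correlation_and_ratio(cov, var_out, var_in):
    """Return (r, R) under the assumed definitions stated above."""
    r = cov / sqrt(var_out * var_in)
    R = sqrt(var_out / var_in)
    return r, R


results = {name: correlation_and_ratio(*row) for name, row in TABLE_I.items()}
for name, (r, R) in results.items():
    print(f"{name}: r = {r:.4f}, R = {R:.4f}")

# Same ratio for the 325729-node data set, from the variances quoted in the text.
R_full = sqrt(461.25 / 1346.85)
print(f"R = {R_full:.4f}")
```

Comparing $r$ with $R$ network by network gives a compact way to see how close each data set is to the relation referred to as eq. 13 in the text.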



APPENDIX D: COMMENT ON THE 
SCIENTIFIC PUBLICATION NETWORK 

In the scientific publication network it is implicit that we are working under the hypothesis that the citation distribution for all papers published in 1981 can be treated as the stationary in-degree distribution of a growing network model. But why can it be treated in this way when studying only the papers of a particular year (1981)? The reason is the following: if the total scientific network has reached (today, in 2007) a proportion of papers with $k$ citations that does not change with time (stationary), then the articles published in 1981 are a sample of this distribution.
FIG. 5: Conditional mean of $D_{in}$ given $D_{out}$, when for each value of $D_{out}$ there exist at least: (a) 500, and (b) 30 nodes. Data presented as a 95% confidence interval. (c) and (d) Different representations of the joint in-out density of the links in a node. (e) Scatter plot of $D_{in}$ as a function of $D_{out}$. (f) Conditional standard deviation of $D_{in}$ given $D_{out}$, $\sigma_{in/out}$.

FIG. 6: $P_{out}(k)$ and $P_{in}(k)$ as a function of $k + 1$. This graph was presented in [14].



